Topic-Specific YouTube Crawling to Detect Online Radicalization
Authors
Abstract
Online video sharing platforms such as YouTube contain many videos and users that promote hate and extremism. Because of the low barrier to publication and the anonymity it offers, YouTube is misused by some users and communities to post negative videos that disseminate hatred against a particular religion, country or person. We formulate the identification of such malicious videos as a search problem and present a focused-crawler based approach consisting of several components: a search strategy or algorithm, a node similarity computation metric, learning from exemplary profiles that serve as training data, a stopping criterion, a node classifier and a queue manager. We implement two versions of the focused crawler: best-first search and shark search. We conduct a series of experiments by varying the seed, the number of n-grams in the language model based comparer and the similarity threshold for the classifier, and present the results using standard Information Retrieval metrics such as precision, recall and F-measure. The accuracy of the proposed solution on the sample dataset is 69% for best-first search and 74% for shark search. We perform a characterization study (by manual and visual inspection) of the anti-India hate and extremism promoting videos retrieved by the focused crawler, based on the terms present in the video titles, YouTube category, average video length, content focus and target audience. We present the results of applying Social Network Analysis based measures to extract communities and identify core and influential users.
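The best-first variant described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `graph` and `texts` dictionaries, the function names, the use of cosine similarity over character n-gram counts as a stand-in for the language model based comparer, and the default threshold of 0.3 are all assumptions made for illustration. Shark search is not covered here.

```python
import heapq
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_first_crawl(graph, texts, seed, exemplar, threshold=0.3, max_nodes=10, n=3):
    """Visit nodes in order of decreasing similarity to the exemplar text.

    graph: hypothetical dict mapping a node to the nodes it links to
    texts: hypothetical dict mapping a node to its text (e.g. a title)
    Returns the nodes classified as relevant (similarity >= threshold).
    """
    profile = char_ngrams(exemplar, n)

    def score(node):
        return cosine(char_ngrams(texts[node], n), profile)

    frontier = [(-score(seed), seed)]  # max-heap via negated scores (queue manager)
    seen = {seed}
    relevant = []
    visited = 0
    while frontier and visited < max_nodes:  # stopping criterion: node budget
        neg_score, node = heapq.heappop(frontier)
        visited += 1
        if -neg_score < threshold:
            continue  # classified irrelevant: do not expand its links
        relevant.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(nxt), nxt))
    return relevant


def precision_recall_f1(retrieved, truth):
    """Standard IR metrics over sets of retrieved and truly relevant items."""
    retrieved, truth = set(retrieved), set(truth)
    tp = len(retrieved & truth)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(truth) if truth else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Toy data (entirely made up): four nodes, one of them off-topic.
texts = {
    "a": "hate speech video",
    "b": "hate speech rally",
    "c": "cooking pasta recipe",
    "d": "hate speech channel",
}
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
found = best_first_crawl(graph, texts, seed="a", exemplar="hate speech")
```

In this sketch the priority queue always expands the most similar unvisited node first, and a node that falls below the threshold is neither reported nor expanded. Varying `seed`, `n` and `threshold` corresponds loosely to the experimental parameters named in the abstract.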
Similar resources
Solutions to Detect and Analyze Online Radicalization : A Survey
Online Radicalization (also called Cyber-Terrorism or Extremism or Cyber-Racism or CyberHate) is widespread and has become a major and growing concern for society, governments and law enforcement agencies around the world. Research shows that various platforms on the Internet (low barrier to publish content, allows anonymity, provides exposure to millions of users and a potential of a very q...
Applying Social Media Intelligence for Predicting and Identifying On-line Radicalization and Civil Unrest Oriented Threats
Research shows that various social media platforms on the Internet such as Twitter, Tumblr (micro-blogging websites), Facebook (a popular social networking website), YouTube (largest video sharing and hosting website), Blogs and discussion forums are being misused by extremist groups for spreading their beliefs and ideologies, promoting radicalization, recruiting members and creating online virtual...
Learnable Crawling: An Efficient Approach to Topic-specific Web Resource Discovery
The rapid growth of the Internet makes it difficult to find information in such a large network of databases. At present, using a topic-specific web crawler is one way to seek the needed information. The main characteristic of a topic-specific web crawler is that it selects and retrieves only relevant web pages in each crawling process. There are many previous studies focusing on...
Towards a Framework for Web 2.0 Community Success: A Case of YouTube
Although ample research has been conducted on the topic of community, there is still much research to be done on online communities. More specifically, there is a paucity of research on the topic of building successful Web 2.0 communities like YouTube—the top ranked Web 2.0 video sharing website. In this paper, a framework for Web 2.0 community success is proposed based on a theoretical review ...
Learnable Topic-specific Web Crawler
A topic-specific web crawler collects relevant web pages on topics of interest from the Internet. Many previous studies focus on algorithms for web page crawling. The main purpose of those algorithms is to gather as many relevant web pages as possible, and most of them only detail the approaches of the first crawling. However, no one has ever mentioned some important questions, such...